<!DOCTYPE html>
<HTML lang="en">
<HEAD>
    <title> Using CSV files in Baby X </title>
    <meta charset="UTF-8">
        
    <link href="prism.css" rel="stylesheet">
<script src="microlight.js"> </script>
<script src="prism.js"> </script>
<style>
.microlight {
    font-family : monospace;
    white-space : pre;
    background-color : white;
}
    
BODY {
    width:50em;
    margin-left:5em;
    background-color:#c0c0ff;
}

P {
   width : 50em;
}

pre {
   background-color:white;
}
</style>
</HEAD>





<BODY>
    <script src="prism.js"></script>
    <A href ="https://github.com/MalcolmMcLean"> <IMG src = "Cartoon_Owl_clip_art.png" alt="Malcolm's github site">  </A>
    <A href ="index.html"> <IMG src = "babyxrc_logo.svg" alt="Baby X Resource compiler" width = "64" height = "62">  </A>
    &nbsp;&nbsp;
    <IMG src = "babyxrc-banner.svg" width = "256" height = "62" alt = "Baby X RC banner">
<H1> Using CSV files in Baby X </H1>
<P>
The Baby X resource compiler comes with a powerful CSV parser and tools
for integrating CSV data into Baby X programs. You can also use the Baby X
resource compiler to create data which can be included in other programs.  
</P> 
<H3> What is CSV </H3>
<P>
CSV stand for "comma-separated values" files. At its heart it is the 
simplest of formats. A CSV file is simply a write out of data, with one 
record per line, and the fileds separated by commas.
 </P>
<pre>
Name, Salary, Payroll ID
Tom, 32000.00, 123
Dick, 24000.00, 124
Harry, 50000.00, 125
</pre>
<P>
That's it. The simplicity and the human-readbility of this format has made 
it a favourite for storing high value data
</P>
<P>
CSV files represent what computer scientists call a "dataframe". A 
dataframe is simply an array of records, with all the rows in the same 
format and the columns representing fields. So in this case we have three
records (for the three employees),and three fields, name, salary, and 
payroll id. Dataframes are naturally represented in C programs as arrays 
of structs.  
</P>
<P>
The simplicity of CSV is a bit of an illusion. Real datasets often have 
missing data. And string data can get quite long and contain embedded 
quotation marks or newlines. There are also undesirable variations on the 
basic format, such as the use of tabs insead of commas as the separator, 
or the use of commas instead of decimal points. There is also the problem 
that some CSVs have a header line and some do not.
 </P>
<H3> The CSV file parser </H3>
<P>
The CSV file parser which comes with the Baby X resource compiler is 
powerful and easy to use. It will load most CSV files.
</P>
<pre><code class="language-c">
#define CSV_NULL 0       /* null data */
#define CSV_REAL 1       /* floating-point data */
#define CSV_STRING 2     /* string data */
#define CSV_BOOL 3       /* boolean data */

typedef union
{
  double x;        /* real value   */
  char *str;       /* string value */
} CSV_VALUE;

typedef struct
{
  char **names;     /* column names (can be NULL) */
  int *types;       /* type of column CSV_REAL, CSV_STRING etc */
  int width;        /* number of columns */
  int height;       /* number of row (excl header) */
  CSV_VALUE *data;  /* data in row-major format */
} CSV;

CSV *loadcsv(const char *fname);
void killcsv(CSV *csv);
void csv_getsize(CSV *csv, int *width, int *height);
int csv_hasdata(CSV *csv, int col, int row);
double csv_get(CSV *csv, int col, int row);
const char *csv_getstr(CSV *csv, int col, int row);
int csv_hasheader(CSV *csv);
const char *csv_column(CSV *csv, int col, int *type);

</code></pre>
<P>
  The structures are supposed to be opaque but exposed to calling code to 
make troubleshooting easier. However the CVS_xxxx type identifiers are 
needed for querying column types.
</P>
<P>
Sometimes when using CSV files you know what data you are expecting, for 
example for a payroll program,and sometimes you do not, for example when 
implementing a statistical package. After the CSV file is loaded, you can 
quickly query the width (the number of columns, or fields), and the column 
names and types, to ensure that you have data in the format you were
expecting. Or you can use that information to buld dynamic structures at 
run time, which is more difficult.
 </P> 
<H3> Example programs </H3>

<pre><code class="language-c">
   int main(int argc, char **argv)
   {
      CSV *csv;
      int width, height;
      int i, ii;
      int type;
      const char *fieldname;
      double total = 0.0;
      int N = 0;

      csv = loadcsv(argv[1]);
      if (!csv)
      {
         printf("Can't load CSV file %s\n", argv[1]);
         exit(EXIT_FAILURE);
      }
      if (!csv_hasheader(csv))
      {
         printf("CSV file must have a header\n");
         exit(EXIT_FAILURE);
      }
      csv_getsize(csv, &width, &height);

      for (i = 0; i &lt; width; i++)
      {
         fieldname = csv_column(csv, i, &type);
         if (strcmp(fieldname, argv[2]))
         {
             if (type != CSV_REAL)
             {
                printf("Can only take mean of numerical data\n);
                exit(EXIT_FAILURE);
             }
             for (ii =0; ii &lt; height; ii++)
             {
                if (csv_hasdata(csv, i, ii))
                {
                   total += csv_get(csv, i, ii);
                   N++;
                }
             }
             printf("The mean of %s is %f\n", argv[2], total / N);
             break;
         }
      }
      if (i == width)
      {
          printf("Can't find field %s\n", argv[2]);
          exit(EXIT_FAILURE);
      }

      killcsv(csv);
      return 0;
   }
</code></pre>

<P>
Here's a program to take a CSV file and a field name, and calculate the 
mean of that field.
</P>
</BODY>
</HTML>
